The killer cell immunoglobulin-like receptors (KIR) play a fundamental role in the innate immune system, through their 
interactions with human leucocyte antigen (HLA) molecules, leading to the modulation of activity in natural killer (NK) 
cells, mainly related to killing pathogen-infected cells. KIR genes are hugely polymorphic both in the number of genes an 
individual carries and in the number of alleles identified. We have previously developed the Allele Frequency Net Database 
(AFND, http://www.allelefrequencies.net), which captures worldwide frequencies of alleles, genes and haplotypes for sev- 
eral immune genes, including KIR genes, in healthy populations, covering >4 million individuals. Here, we report the 
creation of a new database within AFND, named KIR and Diseases Database (KDDB), capturing a large quantity of data 
derived from publications in which KIR genes, alleles, genotypes and/or haplotypes have been associated with infectious 
diseases (e.g. hepatitis C, HIV, malaria), autoimmune disorders (e.g. type I diabetes, rheumatoid arthritis), cancer and 
pregnancy-related complications. KDDB has been created through an extensive manual curation effort, extracting data 
on more than a thousand KIR-disease records, comprising >50 000 individuals. KDDB thus provides a new community 
resource for understanding not only how KIR genes are associated with disease, but also, by working in tandem with 
the large data sets already present in AFND, where particular genes, genotypes or haplotypes are present in worldwide 
populations or different ethnic groups. We anticipate that KDDB will be an important resource for researchers working in 
immunogenetics. 

Database URL: http://www.allelefrequencies.net/diseases/ 



Introduction 

Natural killer (NK) cells are bone marrow-derived lympho- 
cytes that play an active role in the innate immune system 
by interacting with human leucocyte antigen (HLA) class I
molecules to kill pathogen-infected cells (1). Initially, NK
cells were discovered as a result of their ability to target 
and kill tumour cell lines that expressed little or no HLA 
class I molecules (2). It is now known that the killing func- 
tion in NK cells is dependent on a mixture of activating and 
inhibitory receptors present on the membrane and the
interaction with their HLA ligand (3). Two main types of 
receptors are found in NK cells, C-type lectin-like (NKG2D, 
CD94/NKG2C, CD94/NKG2A) and the immunoglobulin-like 
superfamily (KIR, CD16, NKp30, NKp44, etc.). In the latter, 
the killer cell immunoglobulin-like receptors (KIR), which
mostly bind major histocompatibility complex (MHC) class
I molecules, have been shown to be the most polymorphic.
Although most NK cell receptors bind MHC class I-related
molecules, several Ig-like receptors bind non-HLA ligands:
for example, CD16 binds IgG, triggering an activating
response, and NKp44, NKp30 and NKp46 are activating
receptors that bind molecules expressed by pathogens and
self-ligands (4-8).

The KIR gene cluster is located in the leucocyte receptor 
complex (LRC) at position 19q13.4 (4, 5). To date, 16 KIR 
genes have been identified, coding for receptors with acti- 
vating (KIR2DS1, KIR2DS2, KIR2DS3, KIR2DS4, KIR2DS5A/B 
and KIR3DS1) or inhibitory (KIR2DL1, KIR2DL2, KIR2DL3, 
KIR2DL5A, KIR2DL5B, KIR3DL1, KIR3DL2 and KIR3DL3) func- 
tion, with KIR2DL4 appearing to have both functions. Two 
pseudogenes KIR2DP1 and KIR3DP1 have also been identi- 
fied (9). Structurally, the activating and inhibitory functions 
of KIR are related to the length of their cytoplasmic tail 
that can be short (S) or long (L), distinguished in the no- 
menclature (9). 

Variation in KIR can result from a different gene and/or 
allele content of an individual (10), giving rise to haplotype 
diversity and leading to a very large number of different 
genotypes that have been observed (presence/absence of 
KIR genes). The KIR genes KIR2DL4, KIR3DL2, KIR3DL3 and 
KIR3DP1 are present in nearly all individuals with a few 
exceptions (11), and are commonly known as 'framework' 
genes. The frequencies of inhibitory and activating genes 
vary in different populations, as reviewed in (11). A 24-kb 
band using HindIII digestion and Southern blot analysis
distinguishes the haplotypes, termed A and B, that make up
the genotype (12). The A haplotype is generally non-variable
in its gene content (framework genes plus KIR2DL1,
KIR2DL3, KIR2DS4 and KIR3DL1), although occasionally one
of these genes may be missing (11). In contrast, the B
haplotype contains one or more of the genes encoding activating
KIRs (KIR2DS1/2/3/5 and KIR3DS1) and the genes encoding
inhibitory KIRs (KIR2DL5A/B and KIR2DL2). In B haplotypes,
variability is created by both the presence/absence of a 
gene and by allelic variation; in contrast, A haplotypes 
owe much of their variability to allele content (11). At the 
last release of IPD-KIR (Release 2.4.0), there were 601 KIR 
alleles reported (13). B haplotypes tend to be more preva- 
lent in non-Caucasian populations, such as Australian 
Aborigines and Asian Indians, whereas in Caucasian popu- 
lations, ~55% will have one and 30% two A haplotypes 
(14, 15). It is thought that populations with higher
frequencies of B haplotypes are those under strong pressure from
infectious diseases. Such extensive diversity among modern
populations may indicate that geographically distinct dis- 
eases have exerted recent or perhaps on-going selection on 
KIR repertoires. From a practical viewpoint, this makes the 
choice of controls very important for all disease association 
studies. 
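
The gene-content description of the A and B haplotypes given above can be turned into a simple rule for presence/absence genotypes. The sketch below is only an illustration and is not part of AFND or KDDB: it labels a genotype AA when none of the B-haplotype genes listed in the text is detected, and Bx otherwise. The function name and the input format are assumptions made for this example, and the rule ignores the occasional A haplotype that lacks one of its usual genes.

```python
# Genes that the text lists as characteristic of B haplotypes.
B_HAPLOTYPE_GENES = frozenset({
    "KIR2DS1", "KIR2DS2", "KIR2DS3", "KIR2DS5", "KIR3DS1",
    "KIR2DL2", "KIR2DL5A", "KIR2DL5B",
})


def classify_genotype(genes_present):
    """Return 'AA' if no B-haplotype gene is present, otherwise 'Bx'.

    genes_present: iterable of KIR gene names detected in one individual
    (a hypothetical input format chosen for this example).
    """
    present = set(genes_present)
    return "Bx" if present & B_HAPLOTYPE_GENES else "AA"


framework = {"KIR2DL4", "KIR3DL2", "KIR3DL3", "KIR3DP1"}
a_like = framework | {"KIR2DL1", "KIR2DL3", "KIR2DS4", "KIR3DL1"}
print(classify_genotype(a_like))                # AA
print(classify_genotype(a_like | {"KIR2DS1"}))  # Bx
```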

To collect allele, haplotype and genotype frequencies of 
several immune genes in different healthy human populations,
the Allele Frequency Net Database (AFND) was developed (16).
AFND stores large sets of data regarding HLA, KIR, major
histocompatibility complex class I chain related (MIC) and
cytokine gene polymorphisms, and has proved to be frequently
used in the immunogenetics field, receiving
200 hits per day on average. To date, 398 different KIR
genotypes in 12 856 individuals from 109 populations 
have been reported to AFND. 

Owing to their high level of polymorphism, KIR genes have
been associated with many infectious and autoimmune diseases
in different ways, ranging from associations with single
genes (or single alleles) to associations with groups of
genes and full genotypes (17-20). A disease association is
defined as a statistically significant association between a
genetic element (gene, allele, genotype, etc.) and a given
disease outcome, either positive or negative, i.e. the genetic
profile makes the disease more likely/severe or less likely/severe
than in the control population. As such, the development
of a database to store data regarding disease associations
with those genes is a necessary step towards a more
effective comprehension of such complex data. As KIR disease
association studies are in their infancy compared with
those for HLA, a decision was made to start collecting KIR disease
associations as a new module within AFND.
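
To make the notion of a positive or negative association concrete, the following minimal sketch computes an odds ratio from carrier counts in patients and controls. It is an illustration only: the individual studies use their own statistical methods, and the Woolf (logit) confidence interval used here is a stand-in chosen for brevity. The function names and the counts are invented for the example.

```python
import math


def odds_ratio(cases_pos, cases_neg, controls_pos, controls_neg):
    """Odds of carrying a KIR profile in patients relative to controls."""
    return (cases_pos * controls_neg) / (cases_neg * controls_pos)


def woolf_ci(cases_pos, cases_neg, controls_pos, controls_neg, z=1.96):
    """Approximate 95% confidence interval using Woolf's logit method."""
    log_or = math.log(odds_ratio(cases_pos, cases_neg, controls_pos, controls_neg))
    se = math.sqrt(1 / cases_pos + 1 / cases_neg + 1 / controls_pos + 1 / controls_neg)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Invented counts: 60 of 100 patients and 40 of 100 controls carry the gene.
or_value = odds_ratio(60, 40, 40, 60)
low, high = woolf_ci(60, 40, 40, 60)
if low > 1:
    direction = "positive"          # profile more frequent in patients
elif high < 1:
    direction = "negative"          # profile less frequent in patients
else:
    direction = "not significant"   # interval includes 1
print(round(or_value, 2), direction)  # 2.25 positive
```

An odds ratio above 1 with an interval that excludes 1 corresponds to a positive association in the sense defined above, and one below 1 to a negative association.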

Materials and methods 

Data curation 

The first step towards creation of KDDB was the collection 
and extraction of data from peer-reviewed publications, 
following the workflow shown in Figure 1. Published KIR 
and disease association studies were extracted from the 
HuGE Navigator (version 2.0) (21), which is a web-based 
tool enabling searches of the scientific literature for studies 
on genetic associations with diseases. The HuGE Navigator 
makes use of the MeSH (Medical Subject Headings) termin- 
ology, which contains standardized keywords associated 
with clinically related published studies. In KDDB, we 
loaded MeSH terms that describe specific diseases with 
which associations have been found. Manual curation was 
performed to extract relevant data from retrieved studies. 
A set of consistent rules was applied to ensure that different
curators extracted data in the same way (Figure 1). All
studies identified based on the relevant MeSH terms were
analysed and inserted into KDDB unless they met
one of the following exclusion criteria (as also shown in Figure 1):
(i) the article was not written in English, as we do not have
the capability to translate articles at present, (ii) the study
design was not based on a gene frequency comparison between
two samples with different clinical outcomes (future
updates to KDDB will attempt to include more complex
study designs), (iii) the article identified by the HuGE
Navigator was not in fact related to KIR (i.e. misidentified),
(iv) the study was not related to a disease specifically, but
instead described transplantation outcomes. Studies associating
transplantation outcome and KIR have heterogeneous
designs: some studies associate KIR-ligand
matches/mismatches based on recipient and donor samples,
and others correlate the risk of relapse with KIR combinations.
These study designs are under evaluation to
ascertain whether they can fit the existing KDDB
schema or will be stored in a different database schema
in the future. A data validation pipeline was created to
ensure that quantitative data and metadata had been correctly
extracted from each publication, involving two curators
reviewing the same source publication to reduce the
chance of misinterpretation or copy-paste errors.

Figure 1. The data curation pipeline, the types of data that were extracted from each publication and the submission workflow
developed within KDDB.

Figure 2. Screenshots of the data submission pipeline within KDDB.
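
The four exclusion criteria used during curation can be sketched as a small screening function. This is a hypothetical illustration of the rules described in the text, not the tooling used by the curators (curation was manual); all class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CandidateArticle:
    # Hypothetical fields that mirror the four criteria in the text.
    language: str
    is_two_sample_comparison: bool
    concerns_kir: bool
    is_transplantation_study: bool


def exclusion_reasons(article):
    """Return the exclusion criteria an article meets (empty list = include)."""
    reasons = []
    if article.language.lower() != "english":
        reasons.append("not written in English")
    if not article.is_two_sample_comparison:
        reasons.append("not a two-sample frequency comparison")
    if not article.concerns_kir:
        reasons.append("not related to KIR")
    if article.is_transplantation_study:
        reasons.append("transplantation outcome study")
    return reasons


included = CandidateArticle("English", True, True, False)
excluded = CandidateArticle("English", True, True, True)
print(exclusion_reasons(included))  # []
print(exclusion_reasons(excluded))  # ['transplantation outcome study']
```

Returning the list of reasons, rather than a simple yes/no, keeps a record of why each article was set aside, which is useful when criteria such as the transplantation rule are revisited later.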

Implementation 

The back end of the database was developed using a rela- 
tional database schema. Users can connect to the database 
using the most common web browsers. The web interface 
of KDDB has been created to allow users to retrieve or 
query the database and to submit new data sets. For that 
purpose, interactive web pages for querying and submit- 
ting data were developed using the Active Server Pages 
(ASP) scripting environment and JavaScript language. The 
graphical display was designed using HyperText Markup 
Language (HTML) and Cascading Style Sheets (CSS), ensuring
that the pages are viewable in the most widely used web
browsers. The data submission pipeline will be an important
feature for future updates to KDDB, as we recognize
the benefits of obtaining community input, including un- 
published data sets (see Discussion). 

To submit studies to KDDB, a submission form pipeline 
was developed, which can be accessed through the AFND 
homepage by the menu 'Submissions' and the submenu 
'Add KIR and disease association study' (Figure 2). This 
web form consists of four steps. The first step captures sum- 
mary information about the study including the number of 
patients and controls. Information is also captured on the 
geographic location of the population, the ethnicity and 
the bibliographic reference. The second step captures the 
disease association data — the genes, alleles, haplotypes, 
KIR-HLA ligands, etc., the disease name, the frequency of 
patients and controls exhibiting the given genetic profile 
and the results of the statistical test. The third step 
(optional) allows users to upload anonymized raw data 
(the KIR genetic profile of every individual in the study). 
The fourth step allows users to review their data and 
submit. We anticipate that the submission pipeline will 
become an important tool for users to submit their own un- 
published studies or studies missed in the curation process. 
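
The four-step form described above maps naturally onto a small set of record types. The sketch below is one possible way to model a submission in Python; it does not reflect the actual KDDB schema (which is implemented in a relational database behind ASP pages), and every class name, field name and validation rule is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StudySummary:
    """Step 1: summary information about the study."""
    n_patients: int
    n_controls: int
    location: str
    ethnicity: str
    reference: str


@dataclass
class Association:
    """Step 2: one disease association record."""
    kir_profile: str
    disease: str
    pct_patients: float  # % of patients with the profile
    pct_controls: float  # % of controls with the profile
    p_value: float


@dataclass
class Submission:
    summary: StudySummary
    associations: List[Association] = field(default_factory=list)
    raw_data_path: Optional[str] = None  # step 3 is optional

    def review(self) -> List[str]:
        """Step 4: list problems to fix before submitting (empty if none)."""
        problems = []
        if self.summary.n_patients <= 0 or self.summary.n_controls <= 0:
            problems.append("patient and control counts must be positive")
        if not self.associations:
            problems.append("at least one association is required")
        for a in self.associations:
            if not (0 <= a.pct_patients <= 100 and 0 <= a.pct_controls <= 100):
                problems.append(f"{a.kir_profile}: frequency outside 0-100%")
            if not 0 <= a.p_value <= 1:
                problems.append(f"{a.kir_profile}: P-value outside 0-1")
        return problems


# Placeholder values only; they do not describe a real study.
draft = Submission(
    StudySummary(120, 150, "Country A", "Group A", "Unpublished"),
    [Association("KIR3DS1", "Disease X", 45.0, 30.0, 0.01)],
)
print(draft.review())  # []
```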

Table 1. Summary of data stored in the KDDB

Disease type              No. of studies  No. of records (T/S)  No. of individuals
Infectious                36              145/145               15813
Autoimmune or Idiopathic  61              589/274               30888
Neoplasias                9               135/47                5791
Pregnancy related         11              167/39                4879
Total                     113 a           1027/496 a            56214 a

a Some of the studies fall into more than one disease type category, e.g. tumours originated
from viral infections.

T, total records; S, significant associations.

Figure 3. The query interface within KDDB, showing the additional detail about a given association study retrieved by following
the hyperlink. 



Results 

Website organization and content 

Using the HuGE Literature Finder tool, 159 articles remained
after applying the exclusion criteria detailed in
Figure 1. A further 46 of these articles were removed at this
stage because the studies lacked mandatory data/metadata or
the numerical data were inaccessible, for example displayed
only on charts. From the remaining 113 articles, a total of
1027 KIR-disease associations were captured. The genetic
associations identified in this data compilation included
those with single KIR genes, profiles of combined KIR genes
and/or HLA class I ligands, and full KIR genotypes. In total,
70 unique MeSH terms have been associated with KIR 
across the studies in the present database. Classifying the 
studies by the main disease groups, 36 studies are related to 
infectious diseases, 61 studies are related to autoimmune or 
idiopathic diseases, 11 studies are related to pregnancy 
and 9 articles are related to cancer (Table 1). From these 
studies, a total of 1027 KIR records were inserted into 
KDDB, of which 496 are statistically significant KIR-disease 
associations. 
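
The counts reported above and in Table 1 can be cross-checked with simple arithmetic. The short sketch below uses the values as printed in this article; it shows that the per-category rows add up to slightly more than the stated totals, which is what the Table 1 footnote leads one to expect when some studies are counted under more than one disease type. The variable names are arbitrary.

```python
# Article counts from the text: 159 screened, 46 removed for missing data.
articles_after_screening = 159
articles_removed = 46
print(articles_after_screening - articles_removed)  # 113

# Table 1 rows: (studies, total records, significant records, individuals)
rows = {
    "Infectious": (36, 145, 145, 15813),
    "Autoimmune or Idiopathic": (61, 589, 274, 30888),
    "Neoplasias": (9, 135, 47, 5791),
    "Pregnancy related": (11, 167, 39, 4879),
}
stated_totals = (113, 1027, 496, 56214)

# Sum each column across the four disease categories.
column_sums = tuple(sum(col) for col in zip(*rows.values()))
# Amount by which the row sums exceed the stated totals (double counting).
surplus = tuple(s - t for s, t in zip(column_sums, stated_totals))

print(column_sums)  # (117, 1036, 505, 57371)
print(surplus)      # (4, 9, 9, 1157)
```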

The KIR and Diseases Database is part of the Allele Frequencies
Net Database, and can be accessed through the Allele Frequencies
Net homepage (http://www.allelefrequencies.net/)
using the menu 'KIR' and the submenu 'KIR and disease
associations' or directly at http://www.allelefrequencies.net/diseases/.
The website interface allows the user
to retrieve and query KIR and disease associations applying a 
collection of filters. The user can restrict the search by gene 
or allele, country of origin of studied samples, continent of 
origin of studied samples or studied disease. Those filters can 
be applied alone or used in combination. 
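
The way filters combine can be illustrated with a few lines of Python. This is a sketch of the filtering logic only, assuming records are held as dictionaries; it is not the KDDB query code, and the field names and example records are placeholders invented for the illustration.

```python
def filter_records(records, gene=None, country=None, continent=None, disease=None):
    """Keep records matching every filter that was supplied (None = no filter)."""
    criteria = {
        "gene": gene,
        "country": country,
        "continent": continent,
        "disease": disease,
    }
    active = {key: value for key, value in criteria.items() if value is not None}
    return [
        record for record in records
        if all(record.get(key) == value for key, value in active.items())
    ]


# Placeholder records; they do not correspond to entries in KDDB.
records = [
    {"gene": "KIR2DL3", "country": "Country A", "continent": "Asia", "disease": "Disease X"},
    {"gene": "KIR2DL3", "country": "Country B", "continent": "Europe", "disease": "Disease Y"},
    {"gene": "KIR3DS1", "country": "Country B", "continent": "Europe", "disease": "Disease X"},
]

print(len(filter_records(records, gene="KIR2DL3")))                      # 2
print(len(filter_records(records, gene="KIR2DL3", continent="Europe")))  # 1
print(len(filter_records(records)))                                      # 3
```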

Results from a query are retrieved in a table format, with 
each row being a different disease association with KIR 
(Figure 3). In each row, the following information is dis- 
played: (i) row number, (ii) the associated MeSH term, 
(iii) the country of origin of the sample, (iv) the associated 
KIR profile, (v) the sample size and gene frequencies for 
controls and patients, (vi) odds ratio value, (vii) P-value 
and (viii) statistical method used in comparisons. A link is 
provided, by clicking on the population name, to show the 
demographic information on the disease and correspond- 
ing control populations. As for normal populations in 
AFND, individual KIR gene frequencies or haplotype fre- 
quencies can be plotted on world maps. This enables a 
user to interpret disease association risks for KIR profiles 
in a geographic, ethnic group or individual population- 
based context. Additional functionality is under develop- 
ment for linking to external resources including to the 
IPD-KIR database (www.ebi.ac.uk/ipd/kir/), where the se- 
quences and official nomenclature are maintained. 

Discussion 

In our original search for frequency data in AFND in normal 
populations, we sourced publication data from >65 peer- 
reviewed journals — a complete list of data sets and journals 
may be consulted at http://www.allelefrequencies.net/ 
datasets.asp. However, many disease studies, especially 
those that do not find statistically significant associations, 
are not published, and there is a risk that resources such as 
KDDB could suffer from publication bias. As such, we are 
contacting colleagues working in this field with a request 
to provide their data, even if it is unpublished or does 
not contain a statistically significant association. As unpub- 
lished studies are added to KDDB, we will add a filter to the 
query page allowing users to exclude these data sets if they 
wish to ensure quality control. We are also requesting users 
to upload anonymized raw data (individual KIR type and 
HLA ligands) to enable improved quality control measures 
(such as validation of frequency calculations) and to enable 
advanced analyses of the data. For example, having the 
individual data available will allow analyses such as looking 
at disease associations in the centromeric or the telomeric 
regions. It is known there is extensive linkage
disequilibrium between KIR genes, but this exists separately
in the centromeric half and the telomeric half (22). There is 
little linkage disequilibrium between the two halves, and 
the genes KIR3DP1 and KIR2DL4 are at the division be- 
tween centromeric and telomeric sections. 
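
As an example of the quality control that raw data would make possible, the sketch below recomputes a gene carrier frequency from individual presence/absence genotypes and compares it with a reported value. It is a minimal illustration under stated assumptions: each individual is represented as a set of detected gene names, frequencies are percentages, and the tolerance is arbitrary. None of these choices comes from KDDB itself.

```python
def carrier_frequency(genotypes, gene):
    """Percentage of individuals in which `gene` is present."""
    if not genotypes:
        raise ValueError("no individuals supplied")
    carriers = sum(1 for genes in genotypes if gene in genes)
    return 100.0 * carriers / len(genotypes)


def matches_reported(genotypes, gene, reported_percent, tolerance=0.1):
    """True if the recomputed frequency agrees with the reported one."""
    return abs(carrier_frequency(genotypes, gene) - reported_percent) <= tolerance


# Invented raw data for four individuals.
raw = [
    {"KIR2DL1", "KIR3DS1"},
    {"KIR2DL1"},
    {"KIR2DL1", "KIR3DS1"},
    {"KIR2DL1"},
]
print(carrier_frequency(raw, "KIR3DS1"))       # 50.0
print(matches_reported(raw, "KIR3DS1", 50.0))  # True
print(matches_reported(raw, "KIR3DS1", 75.0))  # False
```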

We already have some associations in KDDB derived 
from the presence of the KIR gene and its HLA ligand, 
and it will be important to expand this collection and in- 
clude raw data. Studies have shown that although KIR and 
HLA genes are coded on different chromosomes, there are 
correlations (both negative and positive) between the pres- 
ence of the KIR gene and corresponding presence/absence 
of the ligand (23, 24). These correlations have been shown 
to be important in diseases. For example, a reciprocal rela- 
tionship exists in populations between the frequencies of 
the KIR A haplotype and the HLA-C2 group. This is believed 
to be due to an increased risk of pre-eclampsia when the
mother has the AA genotype and the foetus carries the
HLA-C2 group (25). Further, KIR2DL3 was found to be associated
with the development of cerebral malaria when the
HLA-C1 ligand is present (26). 

The first release of KDDB reported here includes 
only data we have extracted and curated from the scien- 
tific literature, identified by the HuGE Navigator. We are 
aware that the HuGE Navigator does not retrieve all studies 
and as such we are using other search strategies, for example
via PubMed and Web of Knowledge, to locate studies
missed in the first pass curation process. We have currently 
excluded studies that do not fit into the simple model of a 
case-control disease association study. Capturing more 
complex stratification studies is possible in KDDB, but 
will necessitate either some loss of granularity of the 
data, or the development of a much more complex 
schema and display interface. KDDB also does not yet con- 
tain any raw data, although the schema and submission 
pipeline are developed and tested to receive such data. 
KDDB is going to be maintained through our own data 
mining and curation efforts and through the submission 
of data from contributing laboratories (with suitable 
quality control procedures, as currently used in AFND). 
We are also exploring holding community workshops in 
the future to collect and collate data sets not yet in the 
public domain. 

At present, we are not aware of any other site designed 
for public deposition of the raw data associated with immu- 
nogenetic disease association studies, and thus, these are 
not available for public analysis. The release of KDDB pro- 
vides a new home for this raw data, and we encourage 
research groups that have published studies in the past, 
or those in the process of publishing new studies, to deposit 
the raw data within KDDB. We also encourage feedback 
from the scientific community on the utility of the data 
submission and query interface and the general approach 
we have taken to curation. 

Conclusions

Over its 10 years of existence, AFND has provided the
immunogenetics and histocompatibility community with an 
online repository for the examination of frequencies in dif- 
ferent healthy populations. With the development of the 
KDDB, our aim is to cover disease studies that have been 
associated with KIR genes and to include studies in which 
no significant association has been found, to avoid publica- 
tion bias. In the future, we will extend the alleles covered to 
include other loci and new data sets as they are published. 
We anticipate that KDDB will greatly facilitate meta-ana- 
lyses and data re-use to understand the underlying function 
of KIR genes in a variety of disease processes. 